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An important endpoint variable in a cocaine rehabilitation study 
is the time to first relapse of a patient after the treatment. We pro¬ 
pose a joint modeling approach based on functional data analysis 
to study the relationship between the baseline longitudinal cocaine- 
use pattern and the interval censored time to first relapse. For the 
baseline cocaine-use pattern, we consider both self-reported cocaine- 
use amount trajectories and dichotomized use trajectories. Variations 
within the generalized longitudinal trajectories are modeled through 
a latent Gaussian process, which is characterized by a few leading 
functional principal components. The association between the base¬ 
line longitudinal trajectories and the time to first relapse is built upon 
the latent principal component scores. The mean and the eigenfunc¬ 
tions of the latent Gaussian process as well as the hazard function 
of time to first relapse are modeled nonparametrically using penal¬ 
ized splines, and the parameters in the joint model are estimated by 
a Monte Carlo EM algorithm based on Metropolis-Hastings steps. 

An Akaike information criterion (AIC) based on effective degrees of 
freedom is proposed to choose the tuning parameters, and a modified 
empirical information is proposed to estimate the variance-covariance 
matrix of the estimators. 

1. Introduction. In cocaine dependence research, it has been shown that 
one’s baseline cocaine-use pattern is related to the risk of posttreatment co¬ 
caine relapse [Fox et al. (2006)], along with many other factors such as co¬ 
caine withdrawal severity, stress and negative mood [Kampman et al. (2001), 
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Sinha (2001, 2007)]. The timeline follow-back (TLFB) [Sobell and Sobell 
(1993)1 Substance Use Calendar is often used to retrospectively construct 
trajectories of daily cocaine use in a baseline period before treatment. The 
TLFB uses a calendar prompt and many other memory aids (e.g., the use of 
key dates such as holidays, birthdays, newsworthy events and other personal 
events as anchor points) to enhance the accuracy of self-report substance-use 
estimates. Fals-Stewart et al. (2000) showed that the TLFB could provide 
reliable daily cocaine-use data that had high retest reliability, high corre¬ 
lation with other cocaine-use measures and high agreement with collateral 
informants’ reports of patients’ cocaine use as well as results obtained from 
urine assays. 

Based on the self-reported daily cocaine-use trajectories, certain sum¬ 
mary statistics can be derived and are often used as predictors in a subse¬ 
quent analysis to explain cocaine relapse outcomes. Commonly used sum¬ 
mary statistics include baseline cocaine-use frequency and average daily use 
amount, and commonly used relapse outcome measures are time to relapse 
(i.e., time to first cocaine use), frequency of use and quantity of use per oc¬ 
casion during the follow-up period [Carroll et al. (1993), Sinha et al. (2006)]. 
Among the different relapse outcome measures, time to first relapse (which 
we also refer as “relapse time” for ease of exposition) is of particular clinical 
importance because it signals the transition of a cocaine-use pattern from 
abstinence to relapse. Sinha et al. (2006) examined time to cocaine relapse 
using Cox proportional hazards regression models. They concluded that the 
amount of cocaine used per occasion during the 90 days prior to inpatient 
admission was significantly associated with relapse time. Guan, Li and Sinha 
(2011) argued that because the baseline cocaine-use trajectories were ran¬ 
dom, summary statistics derived from them were only estimates of one’s 
long-term cocaine-use behavior and could be subject to large measurement 
error. In a regression setting, the use of error-prone variables as predictors 
may cause severe bias to the regression coefficients [Carroll et al. (2006)]. 
To mitigate the bias, Guan, Li and Sinha (2011) proposed a method-of- 
moments-based calibration method for linear regression models and a sub¬ 
sampling extrapolation method that is applicable to both linear and nonlin¬ 
ear regression models. However, their methods require a restrictive assump¬ 
tion that the baseline cocaine-use trajectories are stationary processes, and 
their subsampling extrapolation method is an approximation method which 
cannot completely eliminate the estimation bias in survival analysis. 

We propose a new modeling framework to link one’s baseline cocaine- 
use pattern to relapse time without assuming stationarity for the baseline 
cocaine-use trajectories. We treat the baseline cocaine-use trajectories as 
functional data [Ramsay and Silverman (2005)] and perform functional prin¬ 
cipal component analysis (FPCA) to these trajectories. The resulting FPCA 
scores are then used as predictors to model relapse time. We develop a joint 
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modeling approach to conduct FPCA and functional regression analysis si¬ 
multaneously. We consider two types of baseline cocaine-use trajectories: the 
first is the actual self-reported daily cocaine-use amount as provided by the 
TLFB, whereas the second is a dichotomized version of the first in the form 
of any cocaine use versus no use. The actual daily cocaine-use amount can 
be difficult to estimate depending on the length of the recalling period and 
also due to the lack of a common scale to assess the amount used for the 
different methods of consumption (e.g., intranasal use versus injection). The 
dichotomized cocaine-use trajectories, although maybe less informative, are 
subject to smaller errors and hence are more reliable. 

There is a large volume of recent work on FPCA. See Yao, Muller and 
Wang (2005a), Hall, Miiller and Wang (2006), Li and Hsing (2010) for kernel- 
based FPCA approaches, and James, Hastie and Sugar (2000), Zhou, Huang 
and Carroll (2008, 2010) for spline-based FPCA methods. All these papers 
are concerned with the Gaussian type of functional data and cannot be 
used for generalized longitudinal trajectories. Hall, Muller and Yao (2008) 
proposed to model non-Gaussian longitudinal data by generalized linear 
mixed models, where the FPCA can be performed with respect to some 
latent random processes. Once the FPCA scores are obtained, a common 
approach is to use them as predictors in a second-stage regression analysis 
[e.g., Crainiceanu, Staicu and Di (2009), Yao, Muller and Wang (2005b)]. 
As pointed out in Li, Wang and Carroll (2010), a potential problem with 
such an approach is that the estimation errors in FPCA are not properly 
taken into account in the second stage regression analysis, hence, the esti¬ 
mated coefficients can be biased and variations in the estimators may be 
underestimated. By performing FPCA and functional regression analysis 
simultaneously, we can avoid these complications. 

Our work is also related to joint modeling of longitudinal data and sur¬ 
vival time [e.g., Ratcliffe, Guo and Ten Have (2004), Wulfsohn and Tsiatis 
(1997), Yan and Fine (2005), Yao (2007, 2008), Su and Wang (2012)]. How¬ 
ever, the vast majority of the existing literature focuses on the instantaneous 
effect of longitudinal data on survival time. In other words, the hazard rate 
of the event time is only related to the value of the longitudinal process at 
the moment of event. In our problem, the longitudinal trajectories were col¬ 
lected prior to the relapse period and we want to use the entire baseline-use 
trajectory as a functional predictor in the survival analysis. Survival analysis 
with functional predictors is not well studied in the literature compared with 
other functional regression models, and an extra complication in our data 
is that the relapse time is interval censored (see Section 2.1 for details). As 
noted in Cai and Betensky (2003), Sun (2006), one prominent difficulty in 
modeling interval censored survival data is that, unlike right censored data, 
we cannot separate estimating the baseline hazard function from estimating 
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the hazard regression coefficients using approaches such as the partial like¬ 
lihood. Therefore, we propose to model the log baseline hazard function as 
a spline function. Some recent literature on spline models of the log base¬ 
line hazard function for interval censored data includes Cai and Betensky 
(2003), Kooperberg and Clarkson (1997), Rosenberg (1995) and Zhang, Hua 
and Huang (2010). 

2. Data structure and joint model. 

2.1. Description of the motivating data. Our data came from a recently 
completed clinical trial for cocaine dependence treatment. In the study, 
seventy-nine cocaine-dependent subjects were admitted to the Clinical Neu¬ 
roscience Research Unit (CNRU) of the Connecticut Mental Health Center 
to receive an inpatient relapse prevention treatment for cocaine dependence 
lasting for two to four weeks. The CNRU is a locked inpatient treatment and 
research facility that provides no access to alcohol or drugs and only limited 
access to visitors. Upon treatment entry, all subjects were interviewed by 
means of the Structured Clinical Interview for DSM-IV [First et al. (1995)]. 
Variables collected during the interview include age, gender, race, number 
of cocaine-use years and number of anxiety disorders present at interview, 
among others. The TLFB Substance Use Calendar was used to retrospec¬ 
tively construct daily cocaine-use history in the 90 days prior to admission. 

After completing the inpatient treatment, all participants were invited 
back for follow-up interviews to assess cocaine-use outcomes. Four interviews 
were conducted at days 14, 30, 90 and 180 after the treatment. During 
each interview, daily cocaine-use records were collected using the TLFB 
procedure for the period prior to the interview date. A urine toxicology 
screen was also conducted to verify the accuracy of a reported relapse or 
abstinence. A positive urine sample test would suggest that the subject had 
used cocaine at least once in the reporting period before the positive urine 
test, but the test could not tell the exact cocaine-use date(s). If the self- 
reported relapse time had no conflict with the urine tests, we consider it 
as an observed event time. However, some subjects had reported no prior 
cocaine use before the first positive urine sample test, their relapse times 
were interval censored between their first positive urine test and the previous 
negative test (if there was any). There were also subjects who reported no 
cocaine use nor yielded any positive urine samples for the entire study period. 
For these subjects, their relapse time data were right censored at the last 
interview date. In our data, about 50.6% of the subjects had observed relapse 
time; 31.6% were interval censored and 17.8% were right censored. 

In what follows, let N denote the number of study subjects. For the ith 
subject, let Yi = {Yi(tij),j = 1,..., Ui} be the baseline cocaine-use trajectory, 
Tj be a posttreatment relapse time that may be right or interval censored, 
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and Zi be an m-dimensional covariate vector, where tij is the jth observa¬ 
tion time for the ith subject within the baseline time interval T, nt is the 
total number of such observation time, and Zi includes baseline informa¬ 
tion on age, gender (= 1 for female and 0 for male), race (= 1 for African 
American and 0 for the rest), number of cocaine-use years (Cocyrs) and 
number of anxiety disorders present at the baseline interview (Curanxs). As 
mentioned in the Introduction, we consider two cases that Yi{t) is either the 
self-reported use amount on day t or the dichotomized version. 

2.2. Modeling the baseline longitudinal trajectories. 


2.2.1. Generalized functional mixed model. We assume that the longitn- 
dinal observations Yij = Yi [tij) are variables from the canonical exponential 
family [McCullagh and Nelder (1989)] with a probability density or mass 
function 


( 2 . 1 ) 


f{Yij\eij,(l)) =exp 


«(</>) 


{Yijdi. 


b{6ij)} + c{Yij,(l)) , 


where 9ij is the canonical parameter and 4> is a dispersion parameter. Denote 
Hij as the mean of Yij. Then fiij is the first derivative of b{-) at 9ij, that 
is, iXij = b^^\9ij). The inverse function of b^^\-), denoted as g{-), is called 
the canonical link function. We consider two different types of trajectories: 

Gaussian trajectories where y)^^^(t) = log(0.5-|- reported cocaine use on day 

f2l 

t), and dichotomized trajectories where Yf (t) = 1 if the Th subject used 
cocaine on day t, and = 0 otherwise. For Gaussian longitudinal outcomes, 
9ij = hij f{Yij\9ij,(l)) is the density of Normal(0jj,(/>); in the case of 

dichotomized outcomes, 0) is the binary probability mass function 

with 9ij = logit{P(l^j = 1)} and f> = l. We assume that Yi{t) is driven by a 
latent Ganssian process Xi{t) such that 9ij = Xi{tij) and that Xi{t) yields 
a standard Karhunen-Loeve expansion 

(2.2) Xi{t)=gL{t)+if{tY'f,i forteT, 

where /r(t) = E{Aj(t)} is the mean function, if = (ifi,... ,ijjp)'^ is a vec¬ 
tor of orthonormal functions also known as the eigenfunctions in FPGA, 
fi = (^ii, • ■ • 5 Cip)'^ ~ Normal(0, D^) are the principal component scores, = 
diag(di,..., dp) and di > d 2 > ■ ■ ■ > dp > 0 are the eigenvalues. In theory, the 
Karhunen-Loeve expansion contains an infinite number of terms, and trun¬ 
cating the expansion to a finite order is a finite sample approximation to the 
reality. The nnmber of principal components p becomes a model parameter 
and will be chosen by a data-driven method. 


2.2.2. Reduced-rank model based on penalized B-splines. We approximate 
the unknown functions /i(t) and ip{t) by B-splines [James, Hastie and Sugar 
(2000), Zhou, Huang and Garroll (2008)]. The B-spline representation achieves 
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two goals simultaneously: smoothing and dimension reduction. Smoothing 
is needed because the self-reported cocaine-use amount trajectories contain 
a substantial amount of measurement error. With our spline representation, 
each function is parameterized by a small amount of spline coefficients and 
the estimates are further regularized by a roughness penalty. 

Let ^{t) = be a g-dimensional B-spline basis defined 

on equally spaced knots mT,6^heaqxl vector and 0^ = ... ,6^p) 

he a q X p matrix of spline coefficients, then the unknown functions are 
represented as p{t) = and The general recom¬ 

mendation for choosing q in the penalized spline literature is to choose a 
relatively large number q^ p, and let the smoothness of the estimated func¬ 
tions be regularized by the roughness penalty [Ruppert, Wand and Carroll 
(2003)]. The original B-spline basis functions are not orthonormal, therefore, 
we employ the procedure prescribed by Zhou, Huang and Carroll (2008) to 
orthogonalize them so that f dt = Ig, where Ig is a q x q iden¬ 

tity matrix. Under this construction, the orthonormal constraints on ^(t) 
translate into constraints on the coefficients, that is, = Ip. Then the 

reduced-rank model for the latent process takes the form 

(2.3) Xi{t) = subject to 0^0^ = Ip- 

For the Gaussian trajectories, that is, the log-transformed cocaine-use 
amount, T) = where Bi = . ,^(Lni)'^}^ is the 

design matrix by interpolating the basis functions on the observation time 
points and ~ Normal(0, Ug/nJ. The conditional log-likelihood function for 
the baseline-use trajectories is 


di] 

^Longf'^L ) — / Xhons.i^ 
i=l 

(2.4) 

where • = -'^\og{al) - , 

and . 

For the dichotomized trajectories, log{7rjj/(l — vTjj)} = )0fi + 

{tij)Q^^i, where iTij = P{Yij = Ij^i). The conditional log-likelihood func¬ 
tion is 


(2.5) 


^Long 


N 

(0?')=E4i,i. 

2=1 


where 41ng,i = + (1 “ Vij) log(l “ 


rii 
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and 0^^ = ■ To regularize the nonparametric estimators, 

we impose penalties on the norms of their second derivatives [Eilers and 
Marx (1996), Ruppert, Wand and Carroll (2003)]. Define = f ^"{t) x 

dt, then 

j dt = eljggOp, j dt = OliJggO^i. 

The penalized log-likelihood for the baseline longitudinal data is 

(2.6) ^Long(0I,) - 2 ^ , 

where .^Long is either or -^Lo^g for Gaussian and dichotomized trajecto¬ 
ries, respectively, and and are tuning parameters. 

2.3. Modeling the relapse time. We assume that the relapse time Tj de¬ 
pends on the baseline cocaine-use history Yiit) only through the latent pro¬ 
cess Xi{t). Moreover, the conditional hazard of Tj given {Xi{t),t E T} and 
the covariate vector Zi follows the Cox proportional hazards model. Our way 
of including the functional covariate Xi into survival analysis is closely re¬ 
lated to the functional linear model; see Ramsay and Silverman (2005), Yao, 
Muller and Wang (2005b), Crainiceanu, Staicu and Di (2009), Li, Wang and 
Carroll (2010) and many others. More specifically, the conditional hazard 
function of Tj is 

Xi{t\Xi,Zi) = Ao(t)exp|y' Xi{s)^{s) ds + zf-q^, 

where Xo{t) is an unknown baseline hazard function, ry is a coefficient vector 
and lB(s) is an unknown coefficient function. When X has the Karhunen- 
Loeve expansion in (2.2), the coefficient function can be written as a linear 
combination of the eigenfunctions lB(s) = /3jV’i(s) and the integral in 

the model can be simplified as fj-Xi(s)^(s) ds = which moti¬ 

vates the model 

(2.7) Ai(t|^i,Zi) = Ao(t)exp(4^/3-LZfr?). 

One important feature of the cocaine dependence treatment data is that 
the relapse time is partially interval censored. That is, the data are a mixture 
of noncensored, right censored and interval censored data. For the subjects 
with interval censoring, we only know that the relapse time occurred within 
an interval [T/,T7], where T/ < T[. We adopt the idea of Cai and Betensky 
(2003) and model the log baseline hazard as a linear spline function 

K 

log{Ao(t)} = oo + ^ lbfc(i - Kk)+, 

k=l 


( 2 . 8 ) 
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where x+ = max(x,0) and are the knots. The spline basis used in (2.8) 
is also known as the truncated power basis [Ruppert, Wand and Carroll 
(2003)]. There are two immediate benefits for this model. First, Ao(-) is 
guaranteed to be nonnegative, so that we do not have to consider any con¬ 
straints on the parameters when maximizing the likelihood. Second, since 
logAo(-) is modeled as a piecewise linear polynomial, the cumulative haz¬ 
ard function Ao(t) = /g Ao(u)du can be written out in an explicit form. For 
higher order spline functions, such explicit expressions are not available. 

To write out the likelihood for the relapse time, we use the following 
notation. For the Rh subject we observe {T^,T[,Si), where [T-,T[] gives the 
censoring interval and 6i is the indicator for right censoring. When = 0 
and T- = T [, the event time Ti is right censored at Tf ; when <5* = 1 and 
T- < T[, Ti is interval censored within when 6i = 1 and T- =T[, 

Ti is observed at T[. In addition, 6oi = I{Si = l,Tj^ = T[) is the indicator 
for noncensored relapse time. Denoting Xj = , Z'f)'^, the conditional log- 

likelihood function for the relapse time is [Cai and Betensky (2003)] 

N 

^Relap(0s) = ^^Relap,i where 
i=l 

(2.9) £Reiap,i = <5oi{logAo(T;0 + (Xf0)} - (1 - <5i)exp(Xf0)Ao(T^") 

-h(5i(l -(5oi)log[exp{Ao(l7’) - exp(Kf 6)], 

and ©5 = (a^, 9'^)'^ is the collection of parameters. 

With the log baseline hazard function expressed as a linear spline func¬ 
tion, the log-likelihood function in (2.9) can be evaluated explicitly. To 
regularize the estimators, one commonly used approach is to model the 
polynomial coefficients o = (cq;®!)^ as fixed effects and the spline coeffi¬ 
cients lb = (bi, b 2 , • • •, bA)^ as random effects with b ~ Normal(0, u^Ir'). 
This mixed model setup leads to a penalized log-likelihood 

(2.10) ^Relap(0s) - 

2<^lb 

Ruppert, Wand and Carroll (2003) recommended to use a relatively large 
number of basis functions in a penalized spline estimator, so that the smooth¬ 
ness of logAo(-) is mainly controlled by Following Cai and Betensky 
(2003), we set K = min([A^/4j, 30), where [xj is the floor of x, and choose 
the knots to be equally spaced with respect to the quantiles defined on the 
unique values of {T/,Tt, (T- + T[)j2,i = 1,..., A^}. The variance parame¬ 
ter is treated as a tuning parameter in our nonparametric estimation. 
When analyzing the survival data alone, Cai and Betensky (2003) proposed 
to select by maximizing the marginal likelihood using a Laplace approx¬ 
imation [Breslow and Clayton (1993)]. Choosing in our joint model is 
more challenging and will be addressed in Section 3.2. 
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2.4. The joint model. The principal component scores of the longitu¬ 
dinal data are also latent frailties in the survival model for the relapse time. 
By imposing a normality assumption, the log-likelihood for ^ is 

N 

(2.11) £Frail(0F) = ^^FrailJj ^FraiM = “^logl-Dgl - 

i=l 

where &f = {di, ■ ■ ■, dp)"’" are the diagonal elements of D^. 

The complete data log-likelihood for the joint model is given by combining 
the parts in (2.6), (2.9) and (2.11) as 

N 

(2.12) ^c(0) = ^ -^LongJ + ^Relap.i + ^FrailJ, 

where 0 = (0j), 0g , and the penalized version of (2.12) is 

£p(0;^,V,T’,T^,6,Z) 

= £c(&) — 2 ^^^^ ~ ^ I i • 

Here Y, T\ , 6 and Z are the vectors or matrices pooling the corre¬ 
sponding variables from all subjects. 

3. Methods. 

3.1. Model fitting by the MCEM algorithm. We £t the joint model by an 
EM algorithm treating the latent variables as missing values. In our algo¬ 
rithm, we £x the tuning parameters h^ and and focus on estimating 
the model parameters 0. Selection of the tuning parameters is deferred to 
Section 3.2. 

The loss function of the EM algorithm is 

(3.1) Q(0; 0™rr) = E{£p(0; T, T', 5, ZjjY, T', 5, Z, 0,urr}, 

where £p is the penalized complete data log-likelihood in (2.13) and 0curr 
is the current value of 0. The algorithm updates the parameters by iter¬ 
atively maximizing (3.1) over 0. Given the complexity of the joint model, 
the conditional expectation in (3.1) does not have a closed form, we there¬ 
fore approximate (5(0;0curr) by Markov Chain Monte Carlo (MCMC). 
Let • ■ •, be MCMC samples from the conditional distribution 

(^ilYi,T-,T[,di,Zi,0curr), and then (5(0;0curr) can be approximated by 
Q(0;0curr) = ^p(0; E, T^, r*', 5, Z). This algorithm is a variant 

of the Monte Carlo EM (MCEM) algorithm of McCulloch (1997), and the de¬ 
tails are provided in Sections A.l and A.2 of supplementary material [Ye, Li 
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and Guan (2015)]. To ensure convergence of the MCMC, we also monitor the 
Monte Carlo error in the E-step using the batch means method of Jones et al. 
(2006). Specifically, we divide the Monte Carlo sequence = 1,... ,R} 

in to batches so that we have replicates of Q{Q;Q^^^) to evaluate the 
Monte Carlo error. 

3.2. Model selection by Akaike information criterion. The most pressing 
model selection issue in our joint model is to select the number of principal 
components p since it determines the structure of the baseline trajectories 
and their association with the relapse time. Another important issue is to 
select the tuning parameters. As mentioned before, as long as we include 
enough of a number of spline bases and place the knots reasonably, the 
performance of the estimated functions is mainly controlled by the penalty 
parameters h^,h^ and We propose to select p, h^,h^ and simulta¬ 
neously by minimizing an Akaike information criterion (AIC), which is the 
negative log-likelihood plus a penalty on the model complexity. 

In our setting, the log-likelihood on observed data requires integrating 
out the latent variables ^ from the complete data likelihood (2.12), which is 
intractable. A commonly used approach is to replace the log-likelihood with 
its conditional expectation given the observed data [Ibrahim, Zhu and Tang 
(2008)]. Hence, the AIC is of the form 

AIC(p, V, al) = -2E{^c(0; T", J, Z)lY, T\T\5, Z, 0} + 2M, 

where the conditional expectation is approximated by a Monte Carlo average 
using the Monte Carlo samples in the last MCEM iteration and M is the 
effective degrees of freedom in the model. 

Eor the longitudinal data, both the mean function p{t) and the eigen¬ 
functions ilj{t) are estimated by penalized splines. Eollowing Wei and Zhou 
(2010), the effective degrees of freedom for a P-spline estimator with a 
penalty parameter h is 

{ / N \ -1 A 

(bjB i+hjssj bjB i 

where h can be either or . Since our model consists of one mean func¬ 
tion and p eigenvalues and eigenfunctions, the effective degrees of freedom 
for the longitudinal data is df(/i^) + p x {df(h^) + 1}. 

Similarly, the effective degrees of freedom for the estimated log baseline 
hazard function can be approximated by [Ruppert, Wand and Carroll (2003)] 


df((T^) = trace 
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where is the design matrix from the truncated power basis used in (2.8). 
For interval censored subjects, we approximate the event time by the mid¬ 
point r™ of the interval [Tl,T[] and the design matrix for the ith subject 

By taking into account the degrees of freedom in all model components, 
the AIC for the joint model becomes 

AIC(p, 

(3.2) = -2E{^c(0; T\T^, 6, Z)\Y,T\T\5, Z, 0} 

-|- 2[df(/i^) p X {df(/i^) -|- 1} + df((T^) + m + p]. 

Searching for the minimum of AIC in a four-dimensional space is extremely 
time consuming. One possible simplification is to assume that the baseline 
mean and eigenfunctions have about the same roughness and set h^ = h^ = 
h. Then for each value of p, we search for the optimal value of h and 
over hve grid points in each dimension. We adopt this search scheme in all 
of our numerical studies and it proves to be computationally feasible. 


3.3. Variance estimation. To make inference on parameters in the joint 
model, we need to estimate the variance-covariance matrix of the estima¬ 
tor 0. Let O = {Y,T^,T^,6, Z) be the observed data. Louis (1982) showed 
that the covariance matrix of 0 can be approximated by the inverse of the 
observed information matrix 


5090T 


O 


(3.3) 


d 




o 


+ E{^<p(e;{,0) 


O 


d 


de^ 


£p(0;C,O) 




where ip is the penalized log-likelihood based on complete data (2.13). We 
can estimate this information matrix by evaluating the partial derivatives at 
the final estimator 0 and replacing the conditional expectations by Monte 
Carlo averages using the Monte Carlo samples generated in the final EM 
iteration. 

0ne important distinction between our model and the generalized lin¬ 
ear mixed models or other joint models is that the eigenfunctions are not 
identihable without the orthonormal constraints in (2.3). Because of the con¬ 
straints, the real number of free parameters in 0^ is lower than the nominal 
dimension. As a result, the information matrix dehned above might be sin¬ 
gular. One solution is to reparameterize 0^ so as to remove the constraints. 
Details are given in supplementary material [Ye, Li and Guan (2015)]. 
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A referee pointed out the methods by Meilijson (1989) and Meng and 
Rubin (1991) can also be used to estimate the asymptotic variance of 0. 
These methods are not only based on observed information, but also evaluate 
the derivatives numerically by running additional Markov chains. It is worth 
pointing out that these methods are designed for the cases where there is 
no constraint on the parameter 0. Extending these methods to our problem 
calls for future research. 

4. Simulation study. We illustrate the performance of the proposed meth¬ 
ods by a simulation study. To mimic the real data, we consider two simula¬ 
tion settings where the baseline longitudinal trajectories are Gaussian and 
binary, respectively. In both settings, we simulate N = 100 independent sub¬ 
jects, with rii = 20 baseline longitudinal observations equally spaced on the 
time interval T = [0,20]. 

Gaussian baseline trajectories are generated as Yi{t) = Xi(t) + ei{t), where 
Xi(t) is the fth realization of a Gaussian process with the Karhunen-Loeve 
expansion (2.2). We let the mean function be /r(f) = f/60 -|- sin(37rt/20), 
the eigenvalues be di = 9, d 2 = 2.25 and dfc = 0 for /c > 3, and the eigen¬ 
functions be ijjiit) = —cos(7rf/10)/VT0, = sin(7rf/10)/\/T0. The princi¬ 
pal component scores are simulated as ~ Normal(0, Hg) with 

= diag(9,2.25). The error e{t) is a Gaussian white noise process with 
variance = 0.49. In the case of the binary baseline, Yij are generated 
from a Bernoulli distribution with the probability g~^{Xi{tij)}, where the 
latent process X is simulated the same way as for the Gaussian baseline 
trajectories and g{TT) = log(j^) for 0 < tt < 1. 

Under both simulation settings, we simulate the failure time Tj from 
the Gox proportional hazards model (2.7), which includes the effects of the 
principal component scores and a covariate Zj. We let Zj be a binary ran¬ 
dom variable with a success probability of 0.5, the regression coefficients be 
9 = {/3'^,7])'^ = (1,1,1)^, and the baseline hazard function be Ao(t) = t/20 
for t>0. We assume that the failure time is interval censored at random 
and set the censoring time to be 4, 10 and 20. Let the censoring indicator 
6i be a binary variable independent of and Zj with P{6i = 1) = 0.5. When 

= 1, the event time Tj is censored in the interval between the two closest 
censoring time; if Tj is less than 4, it is censored in [T- = 0,T1 = 4]; if Ti is 
over 20, it is automatically right censored at 20. Overall, the data structure 
is similar to the cocaine dependence treatment data described in Section 2: 
about 12% of the failure times are right censored, 43% are interval censored, 
and the remaining 45% are observed. 

For both baseline settings, we repeat the simulation 100 times and apply 
the proposed method to fit the joint model. For the results reported below, 
we use g = 8 cubic B-splines to model the mean and eigenfunctions of the 
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latent longitudinal process and K = 12 spline basis functions to model the 
log baseline hazard function. Our experience and those of many others [e.g., 
Cai and Betensky (2003), Ruppert, Wand and Carroll (2003), Zhou, Huang 
and Carroll (2008)] suggest that the performance of penalized spline esti¬ 
mators is mainly controlled by the penalty parameters and is not sensitive 
to the choice of spline basis. 

To choose the number of principal components p and the penalty param¬ 
eters /i^, and we conduct a grid search using the proposed AIC (3.2). 
For all the simulations, the AIC selects the correct number p = 2 of principal 
components about 77% of the time and selects p = 3 for the remaining 23% of 
the time. Since AIC has a well-known tendency to select an over-fitted model 
and over-fitting is in general considered less problematic than under-htting, 
this performance is quite satisfactory. For the estimation results below, we 
use the penalty parameters selected by AIC when p is hxed at 2. 

We summarize in Figures 1 and 2 the nonparametric estimators when 
the baseline longitudinal trajectories are Gaussian and binary, respectively. 



Fig. 1. Summary of the nonparametric estimators in the simulation study when the 
baseline longitudinal trajectories are Gaussian. The four panels correspond to ipi{t), ip 2 {t), 
fl{t) and the log baseline hazard function, respectively. In each panel, the dotted curve is the 
true function, the solid curve is the median of the estimator, the dash-dot and dashed curves 
are the 5% and 95% pointwise percentiles, (a) 1st eigenfunction, (b) 2nd eigenfunction. 
(c) Baseline mean function, (d) Log baseline hazard function. 
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(b) 




Fig. 2. Summary of the nonparametric estimators in the simulation study when the 
baseline longitudinal trajectories are binary. The four panels correspond to ijiift), 4>2{t), 
f2{t) and the log baseline hazard function, respectively. In each panel, the dotted curve 
is the true function, the solid curve is the median of the estimator, the dash-dot and 
dashed curves are the 5% and 95% pointwise percentiles, (a) 1st eigenfunction, (b) 2nd 
eigenfunction, (c) Baseline mean function, (d) Log baseline hazard function. 


Each figure contains four panels that summarize '02 (Oi the 

log baseline hazard function. We show in each panel the true curve, the 
median, and the 5th and 95th pointwise percentiles of the estimators. As we 
can see, the spline estimators perform very well in both simulation settings, 
and the median and the pointwise percentiles of the estimated curves are 
very close to the truth. Between the two types of baseline longitudinal data, 
binary trajectories are less informative, and hence the estimated curves are 
more variable. For instance, the integrated mean squared error for the two 
eigenfunctions are 0.0072 and 0.0150 in the Gaussian case and are 0.0462 
and 0.1206 in the binary case. The true log hazard function is log(t/20), 
which is —oo at t = 0; this explains the bigger bias of our spline estimator 
near 0. The bias in the nonparametric part has little effect on estimation of 
the parametric components such as 9. 

We summarize the estimation results of the parametric components for 
both settings in Table 1, where we show the means and Monte Carlo stan¬ 
dard deviations of the estimators. As we can see, the estimators for the 
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Table 1 

Estimation results of the parametric components under both simulation settings, with 
either Gaussian or binary baseline trajectories. Presented in the table are the true value 
of the parameters, mean and Monte-Carlo standard deviations (Stdev) of the estimated 
parameters, and the mean of the estimated standard error using the Louis formula 
(Stder). The joint modeling method (joint) is the proposed method, and the two-stage 
method is by plugging estimated FPCA scores into a second stage survival analysis 


Method 

Parameter 

/3i 

(32 

V 

di 

d2 


Gaussian baseline trajectory 







Two-stage 

True 

1.0000 

1.0000 

1.0000 

9.0000 

2.2500 

0.4900 


Mean 

0.8154 

0.8092 

0.7972 

8.9248 

2.0224 

0.4443 


Stdev 

0.0911 

0.1513 

0.3302 

1.1193 

0.3183 

0.0147 

Joint 

Mean 

0.9824 

1.0130 

0.9782 

9.1184 

2.0861 

0.4839 


Stdev 

0.1253 

0.1926 

0.3885 

1.1558 

0.3349 

0.0157 


Stder 

0.1184 

0.1593 

0.3469 

1.3661 

0.3633 

0.0154 

Binary baseline trajectory 







Two-stage 

True 

1.0000 

1.0000 

1.0000 

9.0000 

2.2500 



Mean 

0.8187 

0.6681 

0.4642 

6.4365 

2.1158 



Stdev 

0.1759 

0.4840 

0.2658 

1.1685 

0.5384 


Joint 

Mean 

0.9798 

0.9890 

0.9997 

9.3307 

2.2823 



Stdev 

0.1380 

0.1727 

0.3724 

1.9894 

0.8342 



Stder 

0.1192 

0.1553 

0.3412 

2.0035 

0.6059 



parametric components are approximately unbiased and the standard de¬ 
viations are reasonably small. We also present the means of the estimated 
standard errors using the modified empirical information in Section 3.3, and 
find that the standard errors slightly underestimate the true standard devia¬ 
tions. This underestimation of standard error is quite common in semipara- 
metric models under small sample sizes, since the standard error is based 
on an estimate of the asymptotic variance, which only captures the leading 
term in the asymptotic distribution of the point estimator [Lin and Carroll 
( 2001 )]. 

To demonstrate the advantage of the joint modeling approach, we also pro¬ 
vide a comparison between our method and a two-stage functional survival 
analysis approach, where we perform FPCA to the longitudinal trajectory 
first and then use the estimated principal component scores as predictors 
in the second-stage survival analysis. For Gaussian longitudinal trajectories, 
the FPC scores are estimated by the principal analysis by the conditional 
expectation (PACE) method [Yao, Muller and Wang (2005a)]; for the di¬ 
chotomized trajectories, the FPC scores are estimated by the method of 
Hall, Muller and Yao (2008) which is implemented in a PACE-GRM pack¬ 
age in Matlab. The estimation results of the two-stage estimator are also 
provided in Table 1. We can see that the two-stage estimators for (3 and rj 
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are severely biased. This bias is the result of the attenuation effect caused 
by the estimation errors in the FPC scores. 

5. Cocaine dependence treatment data. We apply our proposed joint 
modeling approach to analyze the cocaine dependence treatment data de¬ 
scribed in Section 2. For the baseline cocaine-use trajectories, we consider 
both the (log-transformed) cocaine-use amount trajectories and the dicho¬ 
tomized trajectories. Relapse time is determined from the self-reported post¬ 
treatment cocaine-use trajectories as well as the urine sample tests. As we 
discussed in Section 2, the relapse time is partially interval/right censored. 
We use the hve covariates described in Section 2 in the Cox model, that is, 
age, gender, race, Cocyrs and Curanxs. To capture potential weekly periodic 
patterns of the baseline trajectories, we aligned the baseline trajectories by 
weekdays such that all trajectories start from the first Sunday of the baseline 
period and last for 80 days. 

We use 30 cubic B-spline basis functions to model the mean and eigen¬ 
functions of the baseline trajectories so that there are about two knots within 
each weak and the basis functions are flexible enough to capture possible 
weekly patterns in the data. The smoothness of these nonparametric esti¬ 
mators are governed by the data-driven tuning parameters. We use 12 linear 
spline basis functions to model the baseline hazard function, similar to the 
choice in Guan, Li and Sinha (2011). We choose the number of principal 
components and the penalty parameters h^,h^ and cr^ by the proposed 
AIC. The AIC selects three principal components for both types of baseline 
trajectories. The estimated eigenvalues are 16.1960, 2.2097 and 0.8673 for 
the cocaine-use amount trajectories and 61.3838, 0.8986 and 0.1695 for the 
dichotomized trajectories. 

We show the estimated mean and eigenfunctions for the cocaine-use 
amount trajectories in Figure 3 and for the dichotomized trajectories in 
Figure 4. The curves estimated from the two types of trajectories exhibit 
rather similar patterns, and they all show clear weekly periodic structures— 
the baseline trajectories contain 11 weeks of data and these curves have 11 
peaks and troughs matching the weekdays rather closely. If we look beyond 
the local periodic structures and focus on the overall trend of these curves 
over the entire baseline period, we can see that the mean functions are rea¬ 
sonably flat except near the beginning and the end of the baseline period. 
The overall trend in the first eigenfunction is a negative constant function. 
Increasing the loading on the first principal component leads to less cocaine 
use (or lower use probability for dichotomized trajectories), and hence the 
score on the first principal component represents the overall use amount 
(or probability) of a patient. The second principal component represents an 
overall decreasing trend in use amount (or probability) over the recall pe¬ 
riod. The third principal component is a higher order nonlinear trend in the 
trajectories. 
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(a) (b) 



Fig. 3. The mean function and the first three eigenfunctions for the cocaine-use amount 
trajectories, (a) Mean function, (b) 1st eigenfunction, (c) 2nd eigenfunction, (d) 3rd eigen¬ 
function. 

To confirm that the weekly structures in these curves are real, we also 
provide pointwise standard error bands in the plots. Since our simulation 
study shows that the standard error based on the Louis formula underesti¬ 
mates the true standard deviation under a small sample size, we estimate 
the standard error using a bootstrap procedure instead. In our bootstrap 
procedure, we resample the subjects with replacement, fit the joint model 
to the bootstrap samples using the same tuning parameters as for the real 
data, and estimate the standard deviations of the estimators using their 
bootstrap replicates pointwisely. The confidence bands in Figures 3 and 4 
are based on 100 bootstrap replicates. These confidence bands confirm that 
the weekly structures in the eigenfunctions are real. Note that the confidence 
bands in Figure 4 are wider than those in Figure 3 because the dichotomized 
trajectories are less informative. 

The estimated regression coefficients for the Cox model and the corre¬ 
sponding standard errors and p-values are reported in Table 2. The standard 
errors are obtained by bootstrap with 100 replicates. For both types of base¬ 
line trajectories, the second principal component has a significant positive 
effect on the hazard rate of relapse time. This suggests that patients with a 
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(a) 


(b) 




Fig. 4. The mean function and the first three eigenfunctions for the latent process of the 
dichotomized trajectories, (a) Mean function, (b) 1st eigenfunction, (c) 2nd eigenfunction. 
(d) 3rd eigenfunction. 


decline in recent cocaine-use amount or probability relapsed faster. Subjects 
who experienced such a decline might have established a longer period of 
abstinence before entering treatment than those who did not. As a result, it 
would not be surprising for the onset of their cocaine withdrawal symptoms 
to start sooner; this could have in turn caused a faster relapse. Among the 
covariates, Cocyrs is significant, suggesting subjects who had used cocaine 
for fewer years tended to relapse later. 

For comparison purposes, we also report in Table 2 the estimation result 
of the two-stage procedure described in Section 4. In this procedure, FPCA 
and survival analysis are done in successive steps, and the estimation errors 
in the estimated principal component scores are not properly taken into 
account in the survival analysis. It is not surprising that the estimation 
coefficients for the principal component scores by the two-stage procedure 
are attenuated and none of them are significant. 

Following a referee’s suggestion, we have also performed PCA to the use 
amount trajectories without B-spline representation and roughness penalty 
regularization and use the PC scores in the survival analysis. The esti¬ 
mated Cox regression coefficients for the first three principal components 
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Table 2 

Cocaine data analysis under the joint model using either the cocaine-use amount 
trajectories (Amnt.) or the dichotomized use trajectories (Dich.). The table shows the 
estimated coejficients for the variable and five covariates. Cocyrs and Curanxs denote 
the number of cocaine-use years and the number of current anxiety symptoms at baseline 
interview, respectively. “Stder” is the estimated standard error, which is calculated under 
bootstrap in the joint model. The p-value with * indicates significance at a = 0.05 level 


Amnt. 


^2 

^3 

Gender 

Race 

Age 

Cocyrs 

Curanxs 

Two-stage estimator 







Est 

0.0418 

0.1616 

-0.2251 

-0.3818 

-0.4081 

-0.0467 

0.1182 

0.2664 

Stder 

0.0316 

0.0995 

0.1590 

0.2986 

0.3305 

0.0276 

0.0347 

0.2584 

p-value 

0.1870 

0.1046 

0.1570 

0.2011 

0.2169 

0.0908 

0.0007* 

0.3024 

Joint model 








Est 

0.0420 

0.1802 

-0.2021 

-0.3255 

-0.3343 

-0.0449 

0.1098 

0.2348 

Stder 

0.0352 

0.0867 

0.2394 

0.3462 

0.2591 

0.0342 

0.0407 

0.2109 

p-value 

0.2327 

0.0377* 

0.3985 

0.3471 

0.1969 

0.1895 

0.0070* 

0.2655 

Dich. 


^2 

^3 

Gender 

Race 

Age 

Cocyrs 

Curanxs 

Two-stage estimator 







Est 

0.0008 

0.0131 

-0.1331 

-0.3538 

-0.2664 

-0.0437 

0.1031 

0.3582 

Stder 

0.0137 

0.0762 

0.1158 

0.2743 

0.2919 

0.0223 

0.0306 

0.2802 

p-value 

0.9552 

0.8636 

0.2501 

0.8030 

0.3613 

0.0500 

0.0007* 

0.2011 

Joint model 








Est 

0.0064 

0.1840 

-0.2344 

-0.3536 

-0.1567 

-0.0408 

0.0947 

0.2431 

Stder 

0.0135 

0.0936 

0.2261 

0.3128 

0.2343 

0.0315 

0.0393 

0.2544 

p-value 

0.6339 

0.0493* 

0.3000 

0.2583 

0.5035 

0.1951 

0.0160* 

0.3392 


are (0.0380,0.0218,-0.0169) with standard errors (0.0562,0.1283,0.1738). 
In other words, none of these PC scores is found to be significantly related 
to the Hrst relapse time. This is because the cocaine-use amount trajectories 
contain a large amount of error (due to self-reporting and converting differ¬ 
ent consumption methods to equivalent grams), and without regularization 
and joint modeling the estimation errors in the PC scores greatly attenuate 
the Cox regression coefficients and reduce statistical power. Such a direct 
PCA approach is not applicable to the dichotomized trajectories. 

In our joint modeling analysis, we also closely monitor the convergence of 
the Markov Chain. We estimate the Monte Carlo error in the hnal EM iter¬ 
ation using the method described in Section 3.1, which is 8.3408 x 10““^ for 
the cocaine-use amount trajectories and 7.8830 x 10“^ for the dichotomized 
trajectories. 

In a previous work, Sinha et al. (2006) analyzed a similar data set and 
concluded that the baseline average cocaine-use amount had a significant 
negative effect on the hazard function of relapse; this implies that those 
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who used less during the baseline period tended to relapse sooner, which 
is counterintuitive. In Guan, Li and Sinha (2011), the authors argued that 
the counterintuitive results could be due to measurement error in the av¬ 
erage use amount. After having accounted for the measurement error, they 
found that the baseline average cocaine-use amount was no longer signifi¬ 
cant. Since the first principal component in our joint model is closely related 
to the baseline average cocaine-use amount, our result further confirms the 
analysis of Guan, Li and Sinha (2011). However, we have also found that 
the subject-specific decreasing trend in the cocaine-use trajectories (i.e., the 
second principal component) is related to faster relapse, while such a finding 
was not made by either Sinha et al. (2006) or Guan, Li and Sinha (2011). 

6. Summary. In studying the relationship between baseline cocaine-use 
patterns and posttreatment time to first cocaine relapse, most existing lit¬ 
erature only makes use of some basic summary statistics derived from the 
cocaine-use trajectories, such as the average use amount and frequency of 
use. These summary statistics are subject to measurement error and cannot 
fully describe the dynamic structure of the baseline trajectories. 

We propose an innovative joint modeling approach based on functional 
data analysis to jointly model the baseline generalized longitudinal trajecto¬ 
ries and the interval censored failure time. Specifically, we model the latent 
process that drives the longitudinal responses as functional data, approxi¬ 
mate the mean and eigenfunctions of the latent process by flexible spline 
basis functions, and propose a data-driven method to determine the number 
of principal components and hence the covariance structure of the longi¬ 
tudinal data. We propose and implement a Monte Garlo EM algorithm to 
fit the model and modified empirical information to estimate the standard 
error of the regression coefficients. Our analysis of the cocaine dependence 
treatment data shows that the relapse time is related to a decreasing trend 
in the cocaine-use behaviors rather than the average use amount. 

Our proposed model can also be used to predict the hrst relapse time 
of the new subject. For a future subject, suppose that we only observe 
his/her baseline cocaine-use amount trajectory {Y*(t),t G T}, then we can 
predict his/her first relapse time T* using an empirical Bayes method. Us¬ 
ing the proposed joint model, we can write out the conditional distribution 
[T*(t),t G 7^, where is the vector of latent principal component 
scores for the new subject. We can use the model parameters estimated 
from the training data set, and run an MGMG to draw samples from this 
conditional distribution. We use the MGMG samples to estimate the poste¬ 
rior distribution of T *, which provides both a point predictor and prediction 
intervals. 

As all Monte Garlo based methods, our methods are computationally 
intense. For the cocaine dependence treatment data, it takes about 25 EM 
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iterations for the algorithm to converge and the running time is about 1.5 
hours using the self-reported use amount trajectories and about 2.5 hours 
using the dichotomized use trajectories. It takes a lot longer to perform 
model selection and bootstrap, since we have to fit the model many times. 
However, we argue that the computation time is a worthy price to pay in 
exchange for unbiased estimates and correct statistical inference. One of our 
future research directions is to accelerate the EM algorithm using graphics 
processing units (GPU) and parallel computing. 
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SUPPLEMENTARY MATERIAL 

Supplement A (DOI: 10.1214/15-AOAS852SUPP; .pdf). The online sup¬ 
plementary material for this paper contains the technical details of the 
MCEM algorithm to ht the model, estimation of the covariance matrix of 
the estimator, additional simulation results and sensitivity analysis in the 
real data analysis. 
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